Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
With the growing demand for autonomous control and personalized speech generation,
style control and transfer in Text-to-Speech (TTS) are becoming increasingly
important. In this paper, we propose a new TTS system that performs
style transfer with interpretability and high fidelity. First, we design a
TTS system that combines a variational autoencoder (VAE) with a diffusion refiner to
produce refined mel-spectrograms. Specifically, we design both a two-stage and a
one-stage system to improve audio quality and style transfer performance.
Second, we design a diffusion bridge of quantized VAE to
efficiently learn complex discrete style representations and improve
style transfer performance. To strengthen style transfer further, we
introduce ControlVAE, which improves reconstruction quality while retaining good
interpretability. Experiments on the LibriTTS dataset demonstrate
that our method is more effective than baseline models.
Comment: Accepted at Interspeech202
A Chunk-Based Reordering Model for Phrase-Based SMT Systems
This paper proposes a novel reordering model based on reordering source-language chunks. The model is used as a preprocessing step for phrase-based translation models and integrates well with them. Because it is chunk-based, it can take syntactic information into account during reordering without requiring a full parse of the source sentence. Two experiments were carried out, and the results show that the proposed model greatly improves the performance of a phrase-based statistical machine translation (SMT) system.
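To illustrate the general idea of chunk-level reordering as a preprocessing step, the following is a minimal sketch, not the paper's actual model. It assumes the source sentence has already been split into tagged chunks, and it uses a hypothetical hand-written rule table (`RULES`) that maps a sequence of chunk tags to a permutation; the abstract does not say how the real model chooses its reorderings.

```python
from typing import Dict, List, Tuple

# A chunk is a (tag, tokens) pair, e.g. ("NP", ["the", "park"]).
Chunk = Tuple[str, List[str]]

# Hypothetical rule table (an assumption for illustration only):
# a sequence of chunk tags maps to the order in which those chunks
# should be emitted before the sentence is passed to the SMT system.
RULES: Dict[Tuple[str, ...], Tuple[int, ...]] = {
    ("NP", "PP", "VP"): (0, 2, 1),
}


def reorder_chunks(
    chunks: List[Chunk],
    rules: Dict[Tuple[str, ...], Tuple[int, ...]] = RULES,
) -> List[Chunk]:
    """Scan left to right and permute any chunk window that matches a rule."""
    # Try longer patterns first so the most specific rule wins.
    patterns = sorted(rules, key=len, reverse=True)
    out: List[Chunk] = []
    i = 0
    while i < len(chunks):
        for pattern in patterns:
            window = chunks[i:i + len(pattern)]
            if tuple(tag for tag, _ in window) == pattern:
                out.extend(window[j] for j in rules[pattern])
                i += len(pattern)
                break
        else:
            # No rule matched at this position: keep the chunk in place.
            out.append(chunks[i])
            i += 1
    return out


def flatten(chunks: List[Chunk]) -> List[str]:
    """Turn reordered chunks back into a token sequence."""
    return [tok for _, toks in chunks for tok in toks]


source = [("NP", ["he"]), ("PP", ["in", "the", "park"]), ("VP", ["ran"])]
print(flatten(reorder_chunks(source)))
# → ['he', 'ran', 'in', 'the', 'park']
```

Because the reordering operates only on chunk tags, this kind of preprocessing needs a chunker rather than a full parser, and its output is an ordinary token sequence that a phrase-based system can consume unchanged.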
Translation memory sharing models in XMCAT
This paper describes in detail two Translation Memory (TM) sharing models adopted in XMCAT, a Computer Assisted Translation (CAT) tool that supports cooperative work in machine translation. The first is a center-based TM sharing model, which is suitable only for users on a local area network (LAN); the second is a novel P2P-based TM sharing model, which geographically distributed users can use over the Internet. With these two models, a user can share data with other users over the network, further reducing repeated work and making cooperation easier. The paper also presents the methods XMCAT uses to handle the multiple-translation problem that arises in cooperative memory sharing. The XMCAT system has been adopted and approved by some translation companies.